Some Technical Aspects about Aligning Near Languages

Authors

  • Lluís de Yzaguirre
  • Marta Ribas
  • Jorge Vivaldi
  • M. Teresa Cabré
Abstract

IULA at UPF has developed an aligner that draws on corpus processing results to produce an accurate and robust alignment, even with noisy parallel corpora. It compares the lemmata and part-of-speech tags of the analysed texts, and two main characteristics restrict its use. First, it apparently works only for near languages; second, it requires morphological taggers for both of the languages being compared. Together these prevent the technique from being used for arbitrary language pairs. Wherever it is applicable, it achieves high-quality results.
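To make the idea of comparing lemmata and part-of-speech tags concrete, here is a minimal Python sketch. It is not the IULA aligner: the token format, the function names, the use of `difflib.SequenceMatcher` as a lemma similarity measure, and the Catalan/Spanish sample data are all assumptions chosen for illustration. It only shows why the approach favours near languages, where lemmata of equivalent words tend to look alike.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical token format: (lemma, pos_tag), as a morphological tagger might emit.
Token = Tuple[str, str]


def token_similarity(a: Token, b: Token) -> float:
    """Lemma string similarity, counted only when the POS tags agree."""
    if a[1] != b[1]:
        return 0.0
    return SequenceMatcher(None, a[0], b[0]).ratio()


def sentence_similarity(src: List[Token], tgt: List[Token]) -> float:
    """Average, over source tokens, of the best score among target tokens."""
    if not src or not tgt:
        return 0.0
    total = sum(max(token_similarity(s, t) for t in tgt) for s in src)
    return total / len(src)


def best_alignment(src_sents: List[List[Token]],
                   tgt_sents: List[List[Token]]) -> List[int]:
    """For each source sentence, index of the most similar target sentence."""
    return [
        max(range(len(tgt_sents)),
            key=lambda j: sentence_similarity(s, tgt_sents[j]))
        for s in src_sents
    ]


# Illustrative, hand-tagged sample (Catalan source, Spanish target).
catalan = [
    [("el", "DET"), ("gat", "NOUN"), ("dormir", "VERB")],
    [("la", "DET"), ("casa", "NOUN"), ("ser", "VERB"), ("gran", "ADJ")],
]
spanish = [
    [("la", "DET"), ("casa", "NOUN"), ("ser", "VERB"), ("grande", "ADJ")],
    [("el", "DET"), ("gato", "NOUN"), ("dormir", "VERB")],
]

print(best_alignment(catalan, spanish))  # [1, 0]
```

A greedy best-match like this ignores sentence order and one-to-many alignments, which a real aligner for noisy corpora would have to handle. For distant language pairs the lemma similarity would carry little signal, which is consistent with the restriction stated in the abstract.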

Related resources

Annotating Predicate-Argument Structure for a Parallel Treebank

Abstract We report on a recently initiated project which aims at building a multi-layered parallel treebank of English and German. Particular attention is devoted to a dedicated predicate-argument layer which is used for aligning translationally equivalent sentences of the two languages. We describe both our conceptual decisions and aspects of their technical realisation. We discuss some select...

Some Aspects about Seismology of 2012 August 11 Ahar-Vaezaghan (Azarbayjan, NW of Persia) Earthquakes Sequences

In 2012 August 11 (12:23 UTC) a moderate earthquake with MW=6.4 (USGS) occurred between Ahar and Varzaghan towns in Azarbayjan Province at northwest of Iran. After eleven minutes another earthquake shook the area with MW=6.2 (USGS). These consecutive earthquakes followed by intensive sequences of aftershocks whereas the strongest one had MW=5.3 (USGS). In data processing including depth modific...

Questions related to Bitcoin and other Informational Money

A collection of questions about Bitcoin and its hypothetical relatives Bitguilder and Bitpenny is formulated. These questions concern technical issues about protocols, security issues, issues about the formalizations of informational monies in various contexts, and issues about forms of use and misuse. Some questions are formulated in the more general setting of informational monies and near-mo...

Digital Talking Books in Multiple Languages and Varieties

This paper describes our work in digital talking book alignment, starting by our earlier efforts for the alignment of books in European Portuguese, and ending with the two challenges we are currently facing of aligning books in different varieties of Portuguese and aligning parallel books in different languages. Our alignment module proved robust enough for porting to other varieties of Portugu...

Deterministic Fuzzy Automaton on Subclasses of Fuzzy Regular ω-Languages

In formal language theory, we are mainly interested in the natural language computational aspects of ω-languages. Therefore in this respect it is convenient to consider fuzzy ω-languages. In this paper, we introduce two subclasses of fuzzy regular ω-languages called fuzzy n-local ω-languages and Buchi fuzzy n-local ω-languages, and give some closure properties for those subclasses. We define a ...

Publication date: 2000